Introduction

This is the coursework for MA331, based on talks by Luma Mufleh and Karen Armstrong. The talks assigned are:

  1. Luma Mufleh: Don’t feel sorry for refugees – believe in them

  2. Karen Armstrong: My wish: The Charter for Compassion

For each talk, we investigate the sentiment the speaker expresses, classifying the words used as positive or negative with the help of several libraries. The report also aims to measure the sentiment each speaker conveys over the course of the talk. It addresses the following questions:

  1. What are the most frequent words each speaker uses in their talk?

  2. Are there words that both speakers use?

  3. Which sentiment occurs most frequently among the words each speaker uses?

Methodology

We use several text analytics methods: tokenization to split each transcript into words, sentiment analysis to label those words, and data visualization to present and interpret the results. The sentiment analysis uses the Bing lexicon.

Results

Data Loading & Exploration

## # A tibble: 3 × 5
##   talk_id headline                    text                        speaker  views
##     <dbl> <chr>                       <chr>                       <chr>    <dbl>
## 1       1 Averting the climate crisis "Thank you so much, Chris.… Al Gore 3.27e6
## 2       7 Simplicity sells            "(Music: \"The Sound of Si… David … 1.70e6
## 3      53 Greening the ghetto         "If you're here today — an… Majora… 2.00e6

The first rows above show that the data has several columns. To understand it better, we can look at the structure of the data.

The headline, text and speaker columns are character, while talk_id and views are numeric. Next, we tokenize the text, which gives the following:

## # A tibble: 3 × 5
##   talk_id headline                    speaker   views word 
##     <dbl> <chr>                       <chr>     <dbl> <chr>
## 1       1 Averting the climate crisis Al Gore 3266733 thank
## 2       1 Averting the climate crisis Al Gore 3266733 much 
## 3       1 Averting the climate crisis Al Gore 3266733 chris

The text is now tokenized into one word per row, and common stop words have been dropped (for example, "Thank you so much, Chris" becomes thank, much, chris). This tokenized form is the basis for the word-frequency and sentiment analysis that follows.

After tokenization, we filter to each speaker and count how often each word is used. The most frequent words are shown below.
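The tokenization step can be sketched as follows. This is a minimal illustration in Python, not the code behind the report (whose output format suggests it was produced in R). The `tokenize` function and the tiny `STOP_WORDS` set are hypothetical names introduced here, and a real analysis would use a complete stop-word list.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch only;
# a real analysis would use a complete list from its text-mining library).
STOP_WORDS = {"you", "so", "the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Match runs of letters, optionally followed by an apostrophe part ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in STOP_WORDS]


tokens = tokenize("Thank you so much, Chris.")
print(tokens)  # ['thank', 'much', 'chris']
```

Each surviving token corresponds to one row of the tokenized table, with the talk's other columns repeated alongside it.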

For Luma Mufleh

## # A tibble: 5 × 3
##   speaker     word        n
##   <chr>       <chr>   <int>
## 1 Luma Mufleh one        11
## 2 Luma Mufleh people     11
## 3 Luma Mufleh kids       10
## 4 Luma Mufleh refugee     9
## 5 Luma Mufleh home        8

For Karen Armstrong

## # A tibble: 5 × 3
##   speaker         word          n
##   <chr>           <chr>     <int>
## 1 Karen Armstrong people       23
## 2 Karen Armstrong religion     20
## 3 Karen Armstrong religious    18
## 4 Karen Armstrong world        14
## 5 Karen Armstrong one          13
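Counting the most frequent words per speaker amounts to tallying tokens and sorting by count. A minimal Python sketch, assuming the tokens for one speaker are already available as a list (`top_words` and `sample_tokens` are hypothetical names and made-up data, not taken from the talks):

```python
from collections import Counter


def top_words(tokens, n=5):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


# Made-up tokens for illustration only.
sample_tokens = ["people", "kids", "people", "home", "kids", "people"]
print(top_words(sample_tokens, n=2))  # [('people', 3), ('kids', 2)]
```

Words with equal counts, such as "one" and "people" for Luma Mufleh, are returned in the order they were first seen, so tied ranks should not be over-interpreted.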

To make it easier to compare the most frequent words used by each speaker, we can plot them in a graph.

We can also use word clouds to illustrate the findings, with Luma Mufleh on the left and Karen Armstrong on the right.

From the data, we can see which words each speaker says most often. For Luma Mufleh, the most frequent words are

  1. People

  2. One

  3. Kids

For Karen Armstrong, the most frequent words are

  1. People

  2. Religion

  3. Religious

Looking further, some words are used by both speakers, and these can be compared in the graphs below.

From the graphs, we can see several words that are spoken by both speakers. Two of them, people and one, are among the most frequent words for each speaker.
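Finding the words that both speakers use is a set intersection. The sketch below applies it to the top-five counts reported in the tables above; the variable names are hypothetical, and a full comparison would use every word of each talk rather than only the top five.

```python
# Top-five word counts copied from the tables in this report.
luma_top = {"one": 11, "people": 11, "kids": 10, "refugee": 9, "home": 8}
karen_top = {"people": 23, "religion": 20, "religious": 18, "world": 14, "one": 13}

# Words that appear in both speakers' top-five lists.
shared = sorted(set(luma_top) & set(karen_top))

for word in shared:
    print(word, luma_top[word], karen_top[word])
# one 11 13
# people 11 23
```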

Sentiment Analysis

We can also run a sentiment analysis to enrich our findings. The results are shown below.

For Luma Mufleh

## # A tibble: 2 × 2
##   Sentiment Frequency
##   <chr>         <int>
## 1 negative         55
## 2 positive         34

For Karen Armstrong

## # A tibble: 2 × 2
##   Sentiment Frequency
##   <chr>         <int>
## 1 negative         60
## 2 positive         49
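Lexicon-based sentiment counting of this kind looks each token up in a word-to-label dictionary and tallies the labels. The Python sketch below uses a four-word `TOY_LEXICON` that is invented for illustration and is not the Bing lexicon. The helper names are also hypothetical.

```python
from collections import Counter

# Invented toy lexicon for illustration only (not the Bing lexicon).
TOY_LEXICON = {"hope": "positive", "love": "positive",
               "fear": "negative", "war": "negative"}


def sentiment_counts(tokens, lexicon):
    """Count tokens per sentiment label; tokens not in the lexicon are skipped."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def negative_share(counts):
    """Fraction of sentiment-bearing words that are negative."""
    total = counts["negative"] + counts["positive"]
    return counts["negative"] / total if total else 0.0


counts = sentiment_counts(["war", "hope", "fear", "people", "war"], TOY_LEXICON)
print(counts["negative"], counts["positive"])  # 3 1

# Applying the same share calculation to the counts reported above.
print(round(negative_share(Counter(negative=55, positive=34)), 3))  # 0.618
print(round(negative_share(Counter(negative=60, positive=49)), 3))  # 0.55
```

Only words found in the lexicon contribute, so the totals (89 and 109 here) are much smaller than the number of words in each talk.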

Next, we compute the log odds ratio for each sentiment category, comparing Luma Mufleh with Karen Armstrong, which gives the following:

## # A tibble: 10 × 7
##    Sentiment    `Luma Mufleh` `Karen Armstrong`    OR  log_or Ci_upper Ci_lower
##    <chr>                <int>             <int> <dbl>   <dbl>    <dbl>    <dbl>
##  1 sadness                 39                29 1.57   0.449    0.952   -0.0537
##  2 fear                    44                34 1.51   0.414    0.886   -0.0577
##  3 negative                57                49 1.36   0.310    0.722   -0.101 
##  4 disgust                 17                16 1.20   0.184    0.882   -0.514 
##  5 joy                     28                33 0.953 -0.0484   0.476   -0.573 
##  6 anger                   24                29 0.929 -0.0737   0.487   -0.634 
##  7 anticipation            29                38 0.849 -0.163    0.342   -0.669 
##  8 positive                67                95 0.748 -0.291    0.0601  -0.642 
##  9 surprise                18                27 0.743 -0.297    0.317   -0.911 
## 10 trust                   38                56 0.738 -0.303    0.135   -0.742
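The report does not state the formula behind this table. The Python sketch below is one reconstruction: an odds ratio with a 0.5 continuity correction and a normal-approximation 95% interval on the log scale. That choice is an assumption, made because it reproduces the sadness and trust rows to the precision shown when the column totals (361 and 406) are used as each speaker's total. The function name is hypothetical.

```python
import math


def log_odds_ratio(n_a, total_a, n_b, total_b, z=1.96):
    """Log odds ratio of speaker A versus speaker B for one sentiment.

    Assumed formula: 0.5 is added to each cell for the odds, and the
    standard error uses the uncorrected counts.
    Returns (log_or, lower bound, upper bound).
    """
    odds_a = (n_a + 0.5) / (total_a - n_a + 0.5)
    odds_b = (n_b + 0.5) / (total_b - n_b + 0.5)
    log_or = math.log(odds_a / odds_b)
    se = math.sqrt(1 / n_a + 1 / (total_a - n_a) + 1 / n_b + 1 / (total_b - n_b))
    return log_or, log_or - z * se, log_or + z * se


# Column totals from the table: 361 for Luma Mufleh, 406 for Karen Armstrong.
for name, n_luma, n_karen in [("sadness", 39, 29), ("trust", 38, 56)]:
    log_or, lower, upper = log_odds_ratio(n_luma, 361, n_karen, 406)
    print(f"{name}: log_or={log_or:.3f} lower={lower:.4f} upper={upper:.3f}")
```

A value above zero means the sentiment is relatively more common in Luma Mufleh's talk, and a value below zero means it is relatively more common in Karen Armstrong's.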

These results can also be visualized, as shown below.

From this we can see that Luma Mufleh tends to use more negative-sentiment words than Karen Armstrong. Based on the positive and negative word counts above, about 62% (55 of 89) of the sentiment-bearing words used by Luma Mufleh are negative, compared with about 55% (60 of 109) for Karen Armstrong.

A positive log odds ratio means a sentiment is relatively more likely to occur in Luma Mufleh’s talk, while a negative log odds ratio means it is relatively more likely in Karen Armstrong’s. The sadness, fear, negative and disgust categories have positive log odds ratios, so negative sentiments lean towards Luma Mufleh. Every confidence interval in the table includes zero, however, so these differences should be read as tendencies rather than firm results. The positive and negative counts above also show that, for both speakers, negative words occur more often than positive ones.

Conclusion

The conclusions of the analysis are:

  1. Luma Mufleh tends to use more negative words than Karen Armstrong

  2. One word is among the most frequently spoken by both speakers: ‘People’

  3. From the log odds ratio, negative sentiments are relatively more likely to occur in Luma Mufleh’s talk, and for both speakers negative words occur more often than positive ones